WHU-Mix (raster) building dataset

The WHU-Mix (raster) dataset is a diverse, large-scale, high-quality dataset designed to better reflect practical building extraction, to measure the real performance of deep learning models more reasonably, and to make it easier to evaluate how well a model generalizes to remote sensing images acquired from different sources and places.

As listed in Table 1, the WHU-Mix (raster) dataset consists of two parts: a training/validation (trainval) set and a test set. The trainval set combines data from two existing datasets, the WHU building dataset [1] and the Inria dataset [2] (which we newly edited), with a large number of newly acquired samples. It contains 43,727 image tiles of 512 × 512 pixels. We randomly selected 39,346 of these tiles (about 90% of the trainval set) as the training set, and the remaining 4,381 tiles form the validation set.
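The dataset already ships with its training and validation folders, so no re-splitting is needed. For readers who want to draw a different random split of the same sizes, the following is a minimal sketch rather than the procedure we used: the function name `split_trainval`, the seed, and the tile identifiers are hypothetical.

```python
import random


def split_trainval(names, n_train, seed=0):
    """Shuffle tile names reproducibly and cut them into (train, val) lists."""
    if not 0 <= n_train <= len(names):
        raise ValueError("n_train must be between 0 and len(names)")
    # Sort first so the result does not depend on the order of the input.
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers standing in for the 43,727 trainval tiles.
tiles = [f"tile_{i:05d}" for i in range(43727)]
train, val = split_trainval(tiles, n_train=39346)
print(len(train), len(val))  # 39346 4381
```

Passing the training count explicitly, instead of a 0.9 ratio, reproduces the published sizes exactly, since 39,346 is close to but not exactly 90% of 43,727.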

The test set consists of 8,402 image tiles of 512 × 512 pixels from five cities on five continents (the original uncropped remote sensing images are also provided). The data for Kitsap, Potsdam, and Khartoum are newly edited from existing datasets, i.e., Inria [2], SpaceNet [3], and LAIS [4], respectively, and the data for Wuxi and Dunedin are newly acquired. There is no geographic overlap between the training and test sets in the WHU-Mix dataset.

The image tiles and their corresponding building label maps are stored in TIFF format. The label maps are binary: building pixels have the value 255 and background pixels have the value 0. The images and labels of the training, validation, and test sets are stored in corresponding folders and named according to the patterns “[city]_train_[number]”, “[city]_val_[number]”, and “[city]_[number]”, respectively.
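The naming patterns above can be parsed with a short regular expression. This is a minimal sketch under stated assumptions: the helper `parse_tile_name`, the `.tif` extension, and the example file names are hypothetical, and the expression assumes [number] is a run of digits.

```python
import re
from pathlib import Path

# [city]_train_[number], [city]_val_[number], or [city]_[number] (test set).
TILE_NAME = re.compile(r"^(?P<city>.+?)_(?:(?P<split>train|val)_)?(?P<number>\d+)$")


def parse_tile_name(filename):
    """Return (city, split, number) for a tile file name, ignoring its extension."""
    stem = Path(filename).stem
    match = TILE_NAME.match(stem)
    if match is None:
        raise ValueError(f"unrecognized tile name: {filename!r}")
    # Names without a train/val marker follow the test-set pattern.
    split = match.group("split") or "test"
    return match.group("city"), split, int(match.group("number"))


# Hypothetical file names, used only to illustrate the three patterns.
for name in ["wuxi_train_12.tif", "wuxi_val_3.tif", "dunedin_45.tif"]:
    print(parse_tile_name(name))
```

Grouping the parsed tuples by city makes it straightforward to report per-city scores on the test set.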

Note: ‘*’ denotes that we corrected the incorrect labels in these datasets.

[1] S. Ji, S. Wei, and M. Lu, "Fully Convolutional Networks for Multisource Building Extraction from an Open Aerial and Satellite Imagery Data Set," IEEE Transactions on Geoscience and Remote Sensing, vol. 57, pp. 574-586, 2019.

[2] E. Maggiori, Y. Tarabalka, G. Charpiat, and P. Alliez, "Can Semantic Labeling Methods Generalize to Any City? The Inria Aerial Image Labeling Benchmark," in IEEE International Geoscience and Remote Sensing Symposium, 2017, pp. 3226-3229.

[3] A. Van Etten, D. Lindenbaum, and T. M. Bacastow, "SpaceNet: A Remote Sensing Dataset and Challenge Series," arXiv preprint arXiv:1807.01232, 2018.

[4] P. Kaiser, J. D. Wegner, A. Lucchi, M. Jaggi, T. Hofmann, and K. Schindler, "Learning Aerial Image Segmentation from Online Maps," IEEE Transactions on Geoscience and Remote Sensing, vol. 55, pp. 6054-6068, 2017.